Multiview Semi-supervised Learning for Ranking Multilingual Documents

Authors

  • Nicolas Usunier
  • Massih-Reza Amini
  • Cyril Goutte
Abstract

We address the problem of learning to rank documents in a multilingual context when reference ranking information is only partially available. We propose a multiview learning approach to this semi-supervised ranking task, in which the translation of a document into a given language is considered a view of the document. Although both multiview and semi-supervised learning of classifiers have been studied extensively in recent years, their application to the problem of ranking has received much less attention. We describe a semi-supervised multiview ranking algorithm that exploits a global agreement between view-specific ranking functions on a set of unlabeled observations. We show that our proposed algorithm achieves significant improvements over both semi-supervised multiview classification and semi-supervised single-view rankers on a large multilingual collection of Reuters news covering five languages. Our experiments also suggest that our approach is most effective when few labeled documents are available and the classes are imbalanced.
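To make the idea of agreement between view-specific ranking functions on unlabeled documents concrete, the sketch below counts how often two scorers order a pair of documents differently. It is an illustrative assumption only, not the algorithm from the paper: the function name `pairwise_disagreement`, the two language views, and the score values are all hypothetical.

```python
from itertools import combinations


def pairwise_disagreement(scores_a, scores_b):
    """Fraction of document pairs that two view-specific scorers order differently.

    Hypothetical helper: each list holds one score per unlabeled document,
    produced by the ranking function of one view (for example, one language).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both views must score the same documents")
    pairs = list(combinations(range(len(scores_a)), 2))
    if not pairs:
        return 0.0
    # A pair is discordant when the two views rank it in opposite order.
    discordant = sum(
        1
        for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
    )
    return discordant / len(pairs)


# Made-up scores for four unlabeled documents, one list per language view.
english_view = [0.9, 0.4, 0.7, 0.1]
french_view = [0.8, 0.5, 0.3, 0.2]
print(pairwise_disagreement(english_view, french_view))
```

A semi-supervised multiview ranker could, under this assumption, treat such a quantity as a penalty to keep small on unlabeled data; how the paper actually defines and uses agreement is not stated in the abstract.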


Similar resources

Multi-view Exploratory Learning for AKBC Problems

In this paper, we argue that many Automatic Knowledge Base Construction (AKBC) tasks that have previously been addressed separately can be viewed as instances of a single abstract problem: multiview semi-supervised learning with an incomplete class hierarchy. We also present a general EM framework for solving this abstract task, and summarize past work on various special cases of multiview semi-...


Clustering multilingual documents by estimating text-to-text semantic relatedness

This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora-based approach and a web-search-based approach. Our first approach derives pairwi...


Information Retrieval Using Label Propagation Based Ranking

The IR group participated in the cross-language information retrieval (CLIR) task at the sixth NTCIR workshop (NTCIR 6). In this paper, we describe our approach to the Chinese Single Language Information Retrieval (SLIR) task and the English-Chinese Bilingual CLIR (BLIR) task. We use both bigrams and single Chinese characters as index units and OKAPI BM25 as the retrieval model. The initial retrieved documents are...


Learning Preferences with Co-Regularized Least-Squares

Situations in which only a limited amount of labeled data and a large amount of unlabeled data are available to the learning algorithm are typical of many real-world problems. In this paper, we propose a semi-supervised preference learning algorithm that is based on the multiview approach. Multi-view learning algorithms operate by constructing a predictor for each view and by choosing such predicti...


Semi-supervised document retrieval

DOI: 10.1016/j.ipm.2008.11.002. This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRANK, aims to use the advantages of both the traditional Information Retrieval (I...




Publication date: 2011